A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

نویسنده

Berenger Bramas

چکیده

The modern CPU’s design, which is composed of hierarchical memory and SIMD/vectorization capability, governs the potential for algorithms to be transformed into efficient implementations. The release of the AVX-512 changed things radically, and motivated us to search for an efficient sorting algorithm that can take advantage of it. In this paper, we describe the best strategy we have found, which is a novel two parts hybrid sort, based on the well-known Quicksort algorithm. The central partitioning operation is performed by a new algorithm, and small partitions/arrays are sorted using a branch-free Bitonicbased sort. This study is also an illustration of how classical algorithms can be adapted and enhanced by the AVX-512 extension. We evaluate the performance of our approach on a modern Intel Xeon Skylake and assess the different layers of our implementation by sorting/partitioning integers, double floatingpoint numbers, and key/value pairs of integers. Our results demonstrate that our approach is faster than two libraries of reference: the GNU C++ sort algorithm by a speedup factor of 4, and the Intel IPP library by a speedup factor of 1.4. Keywords—Quicksort; Bitonic; sort; vectorization; SIMD; AVX512; Skylake

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

متن کامل

Intel Skylake review

The Skylake Xeon (called Xeon Scalable processor) introduced a number of innovations, notably AVX-512 vectorization instructions capable of 8-wide double precision vectors (previous AVX2 had 4-wide DP vectors). This change in itself has a potential of doubling performance of floating point codes. Other changes include CPU core optimizations, rearchitecture of the caches, and new, mesh-based top...

متن کامل

Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering

Abstract. Algorithmic and architecture-oriented optimizations are essential for achieving performance worthy of anticipated energy-austere exascale systems. In this paper, we present an extreme scale FMM-accelerated boundary integral equation solver for wave scattering, which uses FMM as a matrix-vector multiplication inside the GMRES iterative method. Our FMM Helmholtz kernels are capable of t...

متن کامل

Landau Collision Integral Solver with Adaptive Mesh Refinement on Emerging Architectures

The Landau collision integral is an accurate model for the small-angle dominated Coulomb collisions in fusion plasmas. We investigate a high order accurate, fully conservative, finite element discretization of the nonlinear multi-species Landau integral with adaptive mesh refinement using the PETSc library (www.mcs.anl.gov/petsc). We develop algorithms and techniques to efficiently utilize emer...

متن کامل

Implementing BLAKE with AVX, AVX2, and XOP

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit singleinstruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-a...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2017

A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake

نویسنده

چکیده

منابع مشابه

Fast Sorting Algorithms using AVX-512 on Intel Knights Landing

Intel Skylake review

Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering

Landau Collision Integral Solver with Adaptive Mesh Refinement on Emerging Architectures

Implementing BLAKE with AVX, AVX2, and XOP

عنوان ژورنال:

اشتراک گذاری